Fast Discovery of Sim
Abstract
The recent emergence of data mining as a major application of machine learning has led to increased interest in fast rule induction algorithms. These are able to process large numbers of examples efficiently, under the constraint of still achieving good accuracy. If e is the number of examples, many rule learners have O(e^4) asymptotic time complexity in noisy domains, and C4.5RULES has been empirically observed to sometimes require O(e^3) time. Recent advances have brought this bound down to O(e log^2 e), while maintaining accuracy at the level of C4.5RULES (Cohen 1995). Ideally, we would like to have an algorithm capable of inducing accurate rules in time linear in e, without becoming too expensive in other factors. This extended abstract presents such an algorithm.

Most rule induction algorithms employ a “separate and conquer” method, inducing each rule to its full length before going on to the next one. They also evaluate each rule by itself, without regard to the effect of other rules. This is a potentially inefficient approach: rules may be grown further than they need to be, only to be pruned back afterwards, when the whole rule set has already been induced. An alternative is to interleave the construction of all rules, evaluating each rule in the context of the current rule set. This can be termed a “conquering without separating” approach, by contrast with the earlier method, and has been implemented in the CWS algorithm.

CWS is outlined in pseudo-code in Table 1. All examples are initially assigned to the majority class. Each rule in CWS is associated with a vector of class probabilities computed from the examples it covers, and predicts the most probable class. Conflicts are resolved by summing the probabilities for all rules covering the test instance, and choosing the class with the highest sum. Acc(RS) is the accuracy of the rule set RS on the training set.
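The prediction and scoring steps described above can be illustrated with a minimal Python sketch. It is not the pseudo-code of Table 1 and does not attempt the linear-time bookkeeping. The rule representation (a dictionary of attribute/value conditions), the names `Rule`, `fit_probs`, `predict` and `acc`, and the toy data are all assumptions made for illustration. Class probabilities are taken here as plain relative frequencies, which the abstract does not specify.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical representation: attribute -> required value.
    conditions: dict
    # Class -> probability, estimated from the training examples the rule covers.
    class_probs: dict

    def covers(self, x):
        return all(x.get(a) == v for a, v in self.conditions.items())


def fit_probs(conditions, examples):
    """Relative class frequencies among the examples matching `conditions`
    (an assumed estimator; the abstract does not say how they are computed)."""
    covered = [y for x, y in examples
               if all(x.get(a) == v for a, v in conditions.items())]
    counts = Counter(covered)
    return {c: k / len(covered) for c, k in counts.items()} if covered else {}


def predict(rules, x, default):
    """Sum the class probabilities of every rule covering x and pick the
    class with the highest sum; uncovered instances get the default class."""
    totals = Counter()
    for r in rules:
        if r.covers(x):
            for c, p in r.class_probs.items():
                totals[c] += p
    return max(totals, key=totals.get) if totals else default


def acc(rules, examples, default):
    """Acc(RS): fraction of training examples the rule set classifies correctly."""
    return sum(predict(rules, x, default) == y for x, y in examples) / len(examples)


# Toy training set (invented for this sketch).
train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
]

# All examples start out assigned to the majority class.
default = Counter(y for _, y in train).most_common(1)[0][0]

# Two overlapping rules: on a rainy, calm day they disagree.
rule_a = Rule({"outlook": "rain"}, fit_probs({"outlook": "rain"}, train))
rule_b = Rule({"windy": "no"}, fit_probs({"windy": "no"}, train))
conflict = {"outlook": "rain", "windy": "no"}
print(predict([rule_a, rule_b], conflict, default))  # play

# A more specific rule, scored in the context of the whole rule set.
cond = {"outlook": "rain", "windy": "yes"}
rule_c = Rule(cond, fit_probs(cond, train))
```

In this toy case `rule_a` alone would predict "stay" for the rainy, calm instance, but the summed vote (about 0.67 for "stay" against about 1.33 for "play") goes to "play". Scoring a candidate rule with `acc` over the full rule set, not in isolation, is the kind of context-dependent evaluation the abstract contrasts with separate-and-conquer learners.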
The CWS procedure would not be efficient if implemented directly. However, by avoiding the extensive redundancy present in the repeated computation of accuracies and class probabilities, the worst-case time complexity of CWS can be made linear in e and all other relevant parameters. CWS has been extensively evaluated using benchmark problems, a large artificial dataset, and a detailed